Detecting Outliers in Data with Correlated Measures

Author: Kifer Daniel
Kuo Yu-Hsuan
Li Zhenhui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/08/2018

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. To use such data in real-world applications, it is critical to detect outliers so that models built from these datasets are not skewed by them. In this paper, we propose a new outlier detection method that exploits correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our method achieves better performance, demonstrating its robustness and generality. Finally, we report interesting case studies on some outliers that result from atypical events.

Comment: 10 pages
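The idea of fitting a regression while explicitly modeling outliers can be illustrated with a small sketch. The abstract does not specify the paper's actual model, so the code below is only an assumption: it uses a generic mean-shift formulation, y_i = a + b*x_i + o_i + noise, with an L1 penalty on the per-point shifts o_i, solved by alternating least squares and soft-thresholding. The function names, the penalty `lam`, and the toy data are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def soft_threshold(r, lam):
    """Shrink r toward zero by lam; values within [-lam, lam] become 0."""
    if r > lam:
        return r - lam
    if r < -lam:
        return r + lam
    return 0.0


def fit_with_outliers(xs, ys, lam, iters=100):
    """Alternate between fitting the line and updating per-point shifts.

    Assumed model (not taken from the paper): y_i = a + b*x_i + o_i + noise,
    minimizing 0.5 * sum(residual^2) + lam * sum(|o_i|).
    Points whose shift o_i ends up nonzero are flagged as outliers.
    """
    shifts = [0.0] * len(xs)
    a, b = fit_line(xs, ys)
    for _ in range(iters):
        # Fit the line to the data with the current outlier shifts removed.
        a, b = fit_line(xs, [y - o for y, o in zip(ys, shifts)])
        # Re-estimate each shift from its residual under the current line.
        shifts = [soft_threshold(y - a - b * x, lam) for x, y in zip(xs, ys)]
    flagged = [i for i, o in enumerate(shifts) if o != 0.0]
    return a, b, flagged


# Toy data: trip time = 2 + 3 * distance, with one corrupted record.
distances = [1, 2, 3, 4, 5, 6, 7, 8]
times = [5, 8, 11, 14, 57, 20, 23, 26]  # index 4 is off by +40
a, b, flagged = fit_with_outliers(distances, times, lam=5.0)
print(a, b, flagged)
```

On this toy data only the corrupted record (index 4) should end up with a nonzero shift. The threshold `lam` controls how large a residual must be before a point is treated as an outlier rather than as noise, and would need tuning on real data.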

arXiv.org e-Print Archive

Crossref

A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data

Author: Kifer Dan
Kuo Yu-Hsuan
Li Zhenhui
Wang Hongjian
Publication date: 28/12/2015

The increased availability of large-scale trajectory data around the world provides rich information for the study of urban dynamics. For example, the New York City Taxi and Limousine Commission regularly releases source-destination information about trips in the taxis it regulates. Taxi data provide information about traffic patterns, and thus enable the study of urban flow: what will traffic between two locations look like at a certain date and time in the future? Existing big data methods try to outdo each other in terms of complexity and algorithmic sophistication. In the spirit of "big data beats algorithms", we present a very simple baseline which outperforms state-of-the-art approaches, including Bing Maps and Baidu Maps (whose APIs permit large-scale experimentation). Such a travel time estimation baseline has several important uses, such as navigation (fast travel time estimates can serve as approximate heuristics for A* search variants for path finding) and trip planning (which uses operating hours for popular destinations along with travel time estimates to create an itinerary).

Comment: 12 pages

arXiv.org e-Print Archive

Crossref

Planck Constraints on Holographic Dark Energy

Author: Li Miao
Li Xiao-Dong
Ma Yin-Zhe
Zhang Xin
Zhang Zhenhui
Publication venue: 'IOP Publishing'
Publication date: 01/01/2013

We perform a detailed investigation of the cosmological constraints on the holographic dark energy (HDE) model using the Planck data. HDE provides a good fit to the Planck high-l (l>40) temperature power spectrum, while the discrepancy at l=20-40 found in LCDM remains unsolved in HDE. The Planck data alone lead to a strong and reliable constraint on the HDE parameter c. At 68% CL, we get c=0.508±0.207 with Planck+WP+lensing, favoring the present phantom HDE at >2 sigma CL. By comparison, using WMAP9 alone we cannot obtain an interesting constraint on c. By combining Planck+WP with the BAO measurements from 6dFGS+SDSS DR7(R)+BOSS DR9, the H0 measurement from HST, and the SNLS3 and Union2.1 SNIa data sets, we get 68% CL constraints c=0.484±0.070, 0.474±0.049, 0.594±0.051 and 0.642±0.066, respectively. Constraints can be improved by 2%-15% if we further add the Planck lensing data. Compared with the WMAP9 results, the Planck results reduce the error by 30%-60%, and prefer a phantom-like HDE at higher CL. We find no evident tension between Planck and BAO/HST. In particular, the strong correlation between Omega_m h^3 and dark energy parameters is helpful in relieving the tension between Planck and HST. The residual chi^2_{Planck+WP+HST}-chi^2_{Planck+WP} is 7.8 in LCDM, and is reduced to 1.0 or 0.3 if we switch dark energy to the w model or the holographic model. We find SNLS3 is in tension with all other data sets; for Planck+WP, WMAP9 and BAO+HST, the corresponding Delta chi^2 is 6.4, 3.5 and 4.1, respectively. By contrast, Union2.1 is consistent with these data sets, but the combination Union2.1+BAO+HST is in tension with Planck+WP+lensing, corresponding to a Delta chi^2 of 8.6 (1.4% probability). Thus, it is not reasonable to perform an all-combined (CMB+SNIa+BAO+HST) analysis for HDE when using the Planck data. Our tightest self-consistent constraint is c=0.495±0.039, obtained from Planck+WP+BAO+HST+lensing.

Comment: 29 pages, 11 figures, 3 tables; version accepted for publication in JCAP

arXiv.org e-Print Archive